Crimes in Chicago

This work aims to uncover spatial and temporal crime patterns in Chicago. Studying these patterns advances academic understanding of criminal activity and offers insight into criminal events, which can help optimize police presence and improve public safety. The analysis investigates and visualizes crime patterns at the level of Chicago's community areas and asks two questions: how do spatial crime patterns differ across crime types, and how do they change across time scales (e.g., day vs. night, month, year)? This notebook walks through the whole project, from collecting data to generating results.

In [97]:
# import required libraries
%matplotlib inline
import os
import fiona

import pprint
import IPython
from matplotlib import pyplot as plt

import pandas as pd
import geopandas as gpd
import folium
from folium.plugins import HeatMap, HeatMapWithTime

File Structure

project/
│   Project Crimes in Chicago.ipynb  
│
└───data/
│   │   Community.zip 
│   │   chicago_pop.csv
│   │   rows.csv
│   │
└───result/
    │   ...

Data Description

This project needs three datasets:

  1. rows.csv (criminal events in Chicago): data provided by the City of Chicago with details on all Chicago criminal events since 2001, including date, time, crime type, district, community area, location coordinates, and location description (e.g., store, apartment).
  2. Community.zip (community boundaries in the city of Chicago): download it from this link and put it in the data/ folder: https://data.cityofchicago.org/api/geospatial/cauq-8yn6?method=export&format=Shapefile
  3. chicago_pop.csv (population of each community in Chicago): contains population data for the years 2000, 2010, and 2020. I collected it myself; please contact weih9@illinois.edu if you need it.
In [ ]:
# Run this line the first time to download the Chicago crime data
! wget -P ./data/ https://data.cityofchicago.org/api/views/ijzp-q8t2/rows.csv
In [3]:
# Read data
crimes = pd.read_csv('data/rows.csv')
community_geo = gpd.read_file(os.path.join('data', 'Community.zip'))[['area_numbe', 'community', 'geometry']]
community_pop = pd.read_csv('data/chicago_pop.csv').iloc[:,0:5]
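Since rows.csv covers every recorded event since 2001, loading it in full can be slow and memory-hungry. The sketch below shows one possible way to read only the columns the later cells use and to parse the date with an explicit format. The in-memory CSV and its two rows are a hypothetical stand-in for data/rows.csv, and the format string is an assumption based on the dates shown in the sample output.

```python
import io
import pandas as pd

# Hypothetical stand-in for data/rows.csv, using the same header names
sample_csv = io.StringIO(
    "ID,Date,Primary Type,Community Area,Year,Latitude,Longitude\n"
    "1,09/05/2015 01:30:00 PM,BATTERY,61,2015,41.815117,-87.67\n"
    "2,09/04/2015 11:30:00 AM,THEFT,25,2015,41.89508,-87.7654\n"
)

# Only the columns needed downstream; everything else is skipped at read time
needed = ["Date", "Year", "Primary Type", "Community Area", "Latitude", "Longitude"]
sample = pd.read_csv(sample_csv, usecols=needed)

# An explicit format avoids per-row format inference (assumed format, see above)
sample["Date"] = pd.to_datetime(sample["Date"], format="%m/%d/%Y %I:%M:%S %p")
```

For the real file, the same `usecols` list could be passed to the `pd.read_csv('data/rows.csv')` call.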
In [326]:
crimes.head(3)
Out[326]:
ID Case Number Date Block IUCR Primary Type Description Location Description Arrest Domestic ... Ward Community Area FBI Code X Coordinate Y Coordinate Year Updated On Latitude Longitude Location
0 10224738 HY411648 09/05/2015 01:30:00 PM 043XX S WOOD ST 0486 BATTERY DOMESTIC BATTERY SIMPLE RESIDENCE False True ... 12.0 61.0 08B 1165074.0 1875917.0 2015 02/10/2018 03:50:01 PM 41.815117 -87.6700 (41.815117282, -87.669999562)
1 10224739 HY411615 09/04/2015 11:30:00 AM 008XX N CENTRAL AVE 0870 THEFT POCKET-PICKING CTA BUS False False ... 29.0 25.0 06 1138875.0 1904869.0 2015 02/10/2018 03:50:01 PM 41.895080 -87.7654 (41.895080471, -87.765400451)
2 11646166 JC213529 09/01/2018 12:01:00 AM 082XX S INGLESIDE AVE 0810 THEFT OVER $500 RESIDENCE False True ... 8.0 44.0 06 NaN NaN 2018 04/06/2019 04:04:43 PM NaN NaN NaN

3 rows × 22 columns

In [327]:
community_geo.head(3)
Out[327]:
Community Area community geometry Community Point community_x community_y
0 35 DOUGLAS POLYGON ((-87.60914 41.84469, -87.60915 41.844... POINT (-87.61868 41.83512) -87.618678 41.835118
1 36 OAKLAND POLYGON ((-87.59215 41.81693, -87.59231 41.816... POINT (-87.60322 41.82375) -87.603216 41.823750
2 37 FULLER PARK POLYGON ((-87.62880 41.80189, -87.62879 41.801... POINT (-87.63242 41.80909) -87.632425 41.809085
In [328]:
community_pop.head(3)
Out[328]:
no name pop_2020 pop_2010 pop_2000
0 1 Rogers Park 55628 54991 63484
1 2 West Ridge 77122 71942 73199
2 3 Uptown 57182 56362 63551

Data Cleaning and Pre-processing

In [44]:
def cleandata(dataset):
    
    # keep only the columns needed for the analysis
    columns_tokeep = ['Date','Year','Primary Type','Community Area','X Coordinate',
                      'Y Coordinate','Latitude', 'Longitude']
    dataset = dataset[columns_tokeep]
    
    # columns that must not contain NaN values
    columns_dropnan = ['Date','Year','Primary Type','Community Area','X Coordinate',
                      'Y Coordinate','Latitude', 'Longitude']
    dataset = dataset.dropna(subset=columns_dropnan, how='any')
    
    # change dtype of columns
    # (explicit format matches dates such as '09/05/2015 01:30:00 PM' and parses much faster)
    dataset['Date'] = pd.to_datetime(dataset['Date'], format='%m/%d/%Y %I:%M:%S %p')
    dataset['Community Area'] = dataset['Community Area'].astype(int)
    
    # time features
    # hours 6-17         -> Day
    # hours 0-5 and 18-23 -> Night
    dataset['Month'] = dataset['Date'].dt.month
    dataset['DayNight'] = dataset['Date'].dt.hour // 6 
    dataset['DayNight'] = dataset['DayNight'].replace({0: 'Night', 1: 'Day',
                                                       2: 'Day', 3: 'Night'})
    
    # re-categorize Primary Type into three groups
    property_loss_list = ['THEFT','BURGLARY', 'MOTOR VEHICLE THEFT', 'DECEPTIVE PRACTICE']
    safety_list = ['BATTERY', 'WEAPONS VIOLATION', 'CRIMINAL DAMAGE', 'ASSAULT', 
                   'ROBBERY', 'SEX OFFENSE', 'CRIM SEXUAL ASSAULT', 'ARSON',
                   'HOMICIDE', 'KIDNAPPING', 'CRIMINAL SEXUAL ASSAULT', 'INTIMIDATION',
                   'STALKING', 'CONCEALED CARRY LICENSE VIOLATION', 'PUBLIC INDECENCY',
                   'HUMAN TRAFFICKING', 'DOMESTIC VIOLENCE']
    others_list = ['NARCOTICS', 'OTHER OFFENSE', 'CRIMINAL TRESPASS', 'PROSTITUTION', 
                   'OFFENSE INVOLVING CHILDREN', 'PUBLIC PEACE VIOLATION', 
                   'INTERFERENCE WITH PUBLIC OFFICER', 'LIQUOR LAW VIOLATION', 
                   'GAMBLING', 'OBSCENITY', 'NON-CRIMINAL', 'OTHER NARCOTIC VIOLATION',
                   'NON - CRIMINAL', 'RITUALISM', 'NON-CRIMINAL (SUBJECT SPECIFIED)'] 
    
    dataset = dataset.replace({'Primary Type': dict.fromkeys(property_loss_list, 'Property')})
    dataset = dataset.replace({'Primary Type': dict.fromkeys(safety_list, 'Safety')})
    dataset = dataset.replace({'Primary Type': dict.fromkeys(others_list, 'Others')})

    return dataset
In [47]:
crimes_clean = cleandata(crimes)
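The three `replace` calls in `cleandata` leave any crime type that is not in one of the lists unchanged. As a rough alternative sketch (not the method used above), the same grouping can be expressed as a single lookup dictionary with `Series.map`, which makes unmapped types easy to spot. The shortened lists and the 'UNKNOWN TYPE' and 'Unmapped' labels are hypothetical, for illustration only.

```python
import pandas as pd

# Shortened, illustrative versions of the lists in cleandata
groups = {
    "Property": ["THEFT", "BURGLARY"],
    "Safety": ["BATTERY", "ASSAULT"],
    "Others": ["NARCOTICS"],
}

# Invert to {crime type: group} so one map() call does the whole recoding
lookup = {crime: group for group, crimes_in_group in groups.items()
          for crime in crimes_in_group}

types = pd.Series(["THEFT", "BATTERY", "NARCOTICS", "UNKNOWN TYPE"])

# Types missing from the lookup become NaN, then get a visible placeholder
grouped = types.map(lookup).fillna("Unmapped")
```

Counting the 'Unmapped' rows would show whether the dataset has gained primary types that the lists do not cover yet.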
In [329]:
# 6882748 rows remain after cleaning
crimes_clean.head(3)
Out[329]:
Date Year Primary Type Community Area X Coordinate Y Coordinate Latitude Longitude Month DayNight
0 2015-09-05 13:30:00 2015 Safety 61 1165074.0 1875917.0 41.815117 -87.67000 9 Day
1 2015-09-04 11:30:00 2015 Property 25 1138875.0 1904869.0 41.895080 -87.76540 9 Day
3 2015-09-05 12:45:00 2015 Others 21 1152037.0 1920384.0 41.937406 -87.71665 9 Day
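The day/night split relies on integer division of the hour by 6, which yields four six-hour blocks. A minimal sketch of how each hour of the day ends up labeled, using the same mapping as `cleandata`:

```python
import pandas as pd

hours = pd.Series(range(24))

# hour // 6 gives block 0 (0-5), 1 (6-11), 2 (12-17), 3 (18-23)
blocks = hours // 6
labels = blocks.map({0: "Night", 1: "Day", 2: "Day", 3: "Night"})

# Number of hours assigned to each label
hours_per_label = labels.value_counts()
```

The boundaries fall at 06:00 and 18:00, so an event at 05:59 counts as night and one at 06:00 as day.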
In [111]:
crimes_count = crimes_clean.groupby(['Year','Month','DayNight','Primary Type','Community Area']).agg({'Date':'count'}).reset_index()
crimes_count = crimes_count.rename(columns={'Date': 'Count'})
In [112]:
crimes_count.head()
Out[112]:
Year Month DayNight Primary Type Community Area Count
0 2001 1 Day Others 4 1
1 2001 1 Day Others 7 1
2 2001 1 Day Others 8 1
3 2001 1 Day Others 13 1
4 2001 1 Day Others 15 1
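Note that the grouped table only contains combinations that actually occur; a community with no events of a given type in a given month has no row rather than a row with Count 0. The toy example below (hypothetical data) illustrates this, using `groupby(...).size()` as an equivalent way to count rows per group.

```python
import pandas as pd

# Hypothetical cleaned events
events = pd.DataFrame({
    "Year": [2001, 2001, 2001, 2002],
    "Community Area": [4, 4, 7, 4],
    "Primary Type": ["Others", "Others", "Safety", "Safety"],
})

# size() counts rows per group; reset_index(name=...) names the count column
counts = (events.groupby(["Year", "Community Area", "Primary Type"])
                .size()
                .reset_index(name="Count"))
```

Here area 7 has no row for 2002, so it would simply be missing from that year's heat map frame.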
In [ ]:
# Only one representative point per community is needed here, so the centroid
# is computed directly in the geographic CRS. Strictly, the geometries should
# be re-projected to a projected CRS before this operation.
community_geo['Community Point'] = community_geo.centroid
community_geo['community_x'] = community_geo['Community Point'].x
community_geo['community_y'] = community_geo['Community Point'].y
community_geo = community_geo.rename(columns={'area_numbe': 'Community Area'})
community_geo['Community Area'] = community_geo['Community Area'].astype(int)
In [330]:
community_geo.head(3)
Out[330]:
Community Area community geometry Community Point community_x community_y
0 35 DOUGLAS POLYGON ((-87.60914 41.84469, -87.60915 41.844... POINT (-87.61868 41.83512) -87.618678 41.835118
1 36 OAKLAND POLYGON ((-87.59215 41.81693, -87.59231 41.816... POINT (-87.60322 41.82375) -87.603216 41.823750
2 37 FULLER PARK POLYGON ((-87.62880 41.80189, -87.62879 41.801... POINT (-87.63242 41.80909) -87.632425 41.809085
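As the comment in the cell above notes, centroids are better computed in a projected CRS. A possible sketch of that approach is shown below on a single hypothetical square near Chicago. The choice of EPSG:26971 (NAD83 / Illinois East, in meters) is an assumption; any projected CRS suitable for the area would serve.

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Hypothetical polygon in lon/lat, standing in for one community boundary
square = Polygon([(-87.63, 41.80), (-87.62, 41.80),
                  (-87.62, 41.81), (-87.63, 41.81)])
areas = gpd.GeoDataFrame({"Community Area": [0]}, geometry=[square],
                         crs="EPSG:4326")

# Project, take centroids in planar coordinates, then convert back to lon/lat
projected = areas.to_crs(epsg=26971)  # assumed CRS choice
centroids = projected.centroid.to_crs(epsg=4326)

areas["community_x"] = centroids.x
areas["community_y"] = centroids.y
```

For polygons as small as community areas the difference from the geographic-CRS centroid is tiny, which is why the shortcut above is acceptable for placing heat map points.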

Crime Heat Map with Time

In [294]:
def CrimesHeatMapWithTime(crimes_count, community_geo, bytime, dayornight = None, crimetype = None):

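    # fail loudly on invalid arguments instead of returning 0,
    # which would only break later at .add_to(...)
    if bytime not in ['Year', 'Month']:
        raise ValueError("bytime must be 'Year' or 'Month'")
    if dayornight is not None:
        if dayornight not in ['Day', 'Night']:
            raise ValueError("dayornight must be 'Day', 'Night', or None")
        crimes_count = crimes_count.loc[crimes_count['DayNight'] == dayornight]
    if crimetype is not None:
        if crimetype not in ['Property', 'Safety', 'Others']:
            raise ValueError("crimetype must be 'Property', 'Safety', 'Others', or None")
        crimes_count = crimes_count.loc[crimes_count['Primary Type'] == crimetype]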
    
    crimes_count_bytime = crimes_count.groupby([bytime,'Community Area']).agg({'Count':'sum'}).reset_index()
    time_index = list(crimes_count_bytime[bytime].sort_values().astype('str').unique())
    crimes_count_bytime = crimes_count_bytime.merge(community_geo, on='Community Area')
    crimes_count_bytime = crimes_count_bytime.sort_values(by = [bytime,'Community Area'], ascending=True)
    
    data = []
    for _, d in crimes_count_bytime.groupby(bytime):
        data.append([[row['community_y'], row['community_x'], row['Count']] for _, row in d.iterrows()])

    heatmap_layer = HeatMapWithTime(data,
                index=time_index,
                auto_play=True,
                use_local_extrema=True
               )

    return heatmap_layer
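`HeatMapWithTime` receives one list per time step, each holding `[lat, lon, weight]` triples, plus a matching list of labels. The sketch below builds that nested structure from a small hypothetical table without `iterrows`, which may be noticeably faster on larger tables. It only prepares the data and does not call folium.

```python
import pandas as pd

# Hypothetical per-year counts already merged with centroid coordinates
counts = pd.DataFrame({
    "Year": [2001, 2001, 2002],
    "community_y": [41.83, 41.82, 41.83],
    "community_x": [-87.61, -87.60, -87.61],
    "Count": [10, 5, 7],
})

frames = []      # one list of [lat, lon, weight] per time step
time_index = []  # label shown for each time step
for year, group in counts.groupby("Year"):
    frames.append(group[["community_y", "community_x", "Count"]].values.tolist())
    time_index.append(str(year))
```

Building the labels in the same loop as the frames keeps the two lists the same length even if some time step has no rows after the merge.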

Crimes change by year

In [315]:
heatmap = folium.Map(location=[41.87, -87.62], # Chicago
               tiles='stamentoner',#'cartodbpositron', stamentoner
               zoom_start=10.2,
               control_scale=True)
# # Add community boundaries
# folium.GeoJson(
#     data=community_geo['geometry'], 
#     style_function = lambda x: {'fillOpacity' : 0, 'weight': 1, 'Opacity' : 0.1}
#     ).add_to(heatmap)

CrimesHeatMapWithTime(crimes_count, community_geo, 'Year').add_to(heatmap)

heatmap
# heatmap.save('result/year.html')

Crimes change by month

In [316]:
heatmap = folium.Map(location=[41.87, -87.62], # Chicago
               tiles='stamentoner',#'cartodbpositron', stamentoner
               zoom_start=10.2,
               control_scale=True)
CrimesHeatMapWithTime(crimes_count, community_geo, 'Month').add_to(heatmap)
# heatmap.save('result/month.html')
heatmap

Day vs. Night Crimes change by year

In [320]:
daynightmap = folium.plugins.DualMap(location=[41.8, -87.62], tiles='stamentoner', zoom_start=10)

CrimesHeatMapWithTime(crimes_count, community_geo, 'Year', dayornight='Day').add_to(daynightmap.m1)
CrimesHeatMapWithTime(crimes_count, community_geo, 'Year', dayornight='Night').add_to(daynightmap.m2)
# daynightmap.save('result/daynight.html')
daynightmap

Changes by year among crime types

In [321]:
crimetypemap = folium.plugins.DualMap(location=[41.8, -87.62], tiles='stamentoner', zoom_start=10)
CrimesHeatMapWithTime(crimes_count, community_geo, 'Year', crimetype = 'Safety').add_to(crimetypemap.m1)
CrimesHeatMapWithTime(crimes_count, community_geo, 'Year', crimetype = 'Property').add_to(crimetypemap.m2)
# crimetypemap.save('result/crimetype.html')
crimetypemap

Crimes Considering Population

In [304]:
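def find_pop(community_pop, row):
    # pick the census closest to the crime year; the last three columns of
    # community_pop are pop_2020, pop_2010, pop_2000, so counting from the end:
    # index 1 -> pop_2000 (2001-2005), 2 -> pop_2010 (2006-2015), 3 -> pop_2020 (2016-2025)
    index = (row['Year']-1996)//10 + 1
    # note: this is a one-element Series (named after the census column), not a scalar
    pop = community_pop[community_pop['no'] == row['Community Area']].iloc[:,-index]
    return pop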

def CrimesDenseHeatMapWithTime(crimes_count, community_geo, bytime, dayornight = None, crimetype = None):

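    # fail loudly on invalid arguments instead of returning 0,
    # which would only break later at .add_to(...)
    if bytime not in ['Year', 'Month']:
        raise ValueError("bytime must be 'Year' or 'Month'")
    if dayornight is not None:
        if dayornight not in ['Day', 'Night']:
            raise ValueError("dayornight must be 'Day', 'Night', or None")
        crimes_count = crimes_count.loc[crimes_count['DayNight'] == dayornight]
    if crimetype is not None:
        if crimetype not in ['Property', 'Safety', 'Others']:
            raise ValueError("crimetype must be 'Property', 'Safety', 'Others', or None")
        crimes_count = crimes_count.loc[crimes_count['Primary Type'] == crimetype]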
    
    crimes_count_bytime = crimes_count.groupby([bytime,'Community Area']).agg({'Count':'sum'}).reset_index()
    time_index = list(crimes_count_bytime[bytime].sort_values().astype('str').unique())
    crimes_count_bytime = crimes_count_bytime.merge(community_geo, on='Community Area')
    crimes_count_bytime = crimes_count_bytime.sort_values(by = [bytime,'Community Area'], ascending=True)
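    # divide by population (community_pop is read from the notebook's global scope);
    # find_pop returns a one-element Series per row, so apply() yields one column per
    # census year with NaN elsewhere, and sum(axis=1) collapses them into a single value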
    crimes_count_bytime['pop'] = crimes_count_bytime.apply(lambda row: find_pop(community_pop, row), axis=1).sum(axis=1)
    crimes_count_bytime['count_pop'] = crimes_count_bytime['Count']/crimes_count_bytime['pop']
    
    data = []
    for _, d in crimes_count_bytime.groupby(bytime):
        data.append([[row['community_y'], row['community_x'], row['count_pop']] for _, row in d.iterrows()])

    heatmap_layer = HeatMapWithTime(data,
                index=time_index,
                auto_play=True,
                use_local_extrema=True
               )

    return heatmap_layer
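The population lookup pairs each crime year with the nearest census (2000, 2010, or 2020). As an alternative sketch, not the approach used in `find_pop`, the same pairing can be done with one merge on a long-format population table. The population figures are the ones shown in the `community_pop` preview; the crime counts and the helper name `census_column` are hypothetical.

```python
import pandas as pd

def census_column(year):
    # Same arithmetic as find_pop: 2001-2005 -> 2000, 2006-2015 -> 2010, 2016-2025 -> 2020
    return f"pop_{2000 + 10 * ((year - 1996) // 10)}"

pop = pd.DataFrame({
    "no": [1, 2],
    "name": ["Rogers Park", "West Ridge"],
    "pop_2020": [55628, 77122],
    "pop_2010": [54991, 71942],
    "pop_2000": [63484, 73199],
})

# Hypothetical yearly counts per community area
counts = pd.DataFrame({"Year": [2003, 2012, 2019],
                       "Community Area": [1, 1, 2],
                       "Count": [100, 200, 300]})

# One row per (community, census year) so the lookup becomes a plain merge
pop_long = pop.melt(id_vars=["no", "name"], var_name="census", value_name="pop")
counts["census"] = counts["Year"].map(census_column)
merged = counts.merge(pop_long, left_on=["Community Area", "census"],
                      right_on=["no", "census"], how="left")
merged["count_pop"] = merged["Count"] / merged["pop"]
```

With `how="left"`, community areas missing from the population table end up with NaN instead of silently producing a population of 0.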
In [322]:
heatmap = folium.Map(location=[41.87, -87.62], # Chicago
               tiles='stamentoner',#'cartodbpositron', stamentoner
               zoom_start=10.2,
               control_scale=True)

CrimesDenseHeatMapWithTime(crimes_count, community_geo, 'Year').add_to(heatmap)

# heatmap.save('result/yearpop.html')
heatmap
In [323]:
popmap = folium.plugins.DualMap(location=[41.8, -87.62], tiles='stamentoner', zoom_start=10)

CrimesHeatMapWithTime(crimes_count, community_geo, 'Year').add_to(popmap.m1)
CrimesDenseHeatMapWithTime(crimes_count, community_geo, 'Year').add_to(popmap.m2)

# popmap.save('result/yearpopcom.html')
popmap
In [324]:
popcrimetypemap = folium.plugins.DualMap(location=[41.8, -87.62], tiles='stamentoner', zoom_start=10)
CrimesDenseHeatMapWithTime(crimes_count, community_geo, 'Year', crimetype = 'Safety').add_to(popcrimetypemap.m1)
CrimesDenseHeatMapWithTime(crimes_count, community_geo, 'Year', crimetype = 'Property').add_to(popcrimetypemap.m2)
# popcrimetypemap.save('result/crimetypepop.html')
popcrimetypemap